Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ActorInit] Fix Bug in Actor creation #32277

Merged
merged 3 commits into from
Feb 13, 2023

Conversation

ijrsvt
Copy link
Contributor

@ijrsvt ijrsvt commented Feb 7, 2023

Why are these changes needed?

Related issue number

  • Noticed this by adding debug statements to the assertion on L283 to figure out why I was hitting it:
(TemporaryActor pid=1225210)   File "/home/ubuntu/anaconda3/envs/my_package/lib/python3.7/site-packages/ray/exceptions.py", line 283, in __init__
(TemporaryActor pid=1225210)     assert isinstance(cause, ActorDiedErrorContext), "Not actor creation: " + str(type(cause)) + "\n: " + str(cause)
(TemporaryActor pid=1225210) AssertionError: Not actor creation: <class 'str'>
(TemporaryActor pid=1225210) : Failed to create the actor ActorID(064094ca4d85ae3e8fc943340e000000). The failure reason is that you set the async flag, but the actor has no any coroutine function.

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@ijrsvt ijrsvt requested a review from rkooo567 February 7, 2023 19:24
@@ -263,14 +263,16 @@ class RayActorError(RayError):
but it is there as a safety check.
"""

def __init__(self, cause: Union[RayTaskError, ActorDiedErrorContext] = None):
def __init__(self, cause: Union[RayTaskError, ActorDiedErrorContext, str] = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. But it's better to fix it at this line https://github.com/ray-project/ray/blob/master/python/ray/_raylet.pyx#L778

by passing an ActorDiedErrorContext instead of adding a new type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern with doing this is that the RayActorError will not have actor_init_failed=True.

I tried updating this function to return a RayTaskError instead, but then only the traceback is only printed in the str() call, not the cause.

I could do something hacky like:

raise RayActorError.from_task_error(
    RayTaskError(
        function_name=f"{class_name}.__init__",
        traceback_str=f"Failed to create the actor {core_worker}. "
                        "The failure reason is that you set the async flag, "
                        "but the actor has no any coroutine function.",
        cause=""
    )
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should set actor_init_failed=True because this flag indicates that the actor failed when executing __init__.
For our case, the actor should fail before executing __init__.

CC @rkooo567 for double check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Thanks for the clarification @jovany-wang! Will move this to ActorDiedErrorContext

@jovany-wang jovany-wang self-assigned this Feb 8, 2023
@rkooo567 rkooo567 self-assigned this Feb 8, 2023
@ijrsvt ijrsvt requested a review from jovany-wang February 8, 2023 20:03
@rkooo567 rkooo567 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 8, 2023
@ijrsvt ijrsvt removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 9, 2023
@jovany-wang jovany-wang added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 9, 2023
Copy link
Contributor

@jovany-wang jovany-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Ian <ian.rodney@gmail.com>
This reverts commit f3ed4cbcc5310c49112aed1ffb3a302fb70cebbb.

Signed-off-by: Ian <ian.rodney@gmail.com>
Signed-off-by: Ian <ian.rodney@gmail.com>
@ijrsvt ijrsvt force-pushed the fix-ray-actor-error branch from 8152170 to ee47c76 Compare February 10, 2023 18:49
@ijrsvt
Copy link
Contributor Author

ijrsvt commented Feb 13, 2023

Failed tests are flaky/failing on master.

@ijrsvt ijrsvt merged commit 2e9b834 into ray-project:master Feb 13, 2023
@ijrsvt ijrsvt deleted the fix-ray-actor-error branch February 13, 2023 16:51
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
In ray-project#28149 RayActorError is called with a str as cause, but this is not an accepted type. This leads to hitting the assertion error in the else case: assert isinstance(cause, ActorDiedErrorContext) on L283.

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants